NSF PAR Search | NSF Public Access Repository

Sample Complexity of Branch-length Estimation by Maximum Likelihood

Clancy, David; Lyu, Hanbaek; Roch, Sebastien (July 2025, International Conference on Machine Learning)

We consider the branch-length estimation problem on a bifurcating tree: a character evolves along the edges of a binary tree according to a two-state symmetric Markov process, and we seek to recover the edge transition probabilities from repeated observations at the leaves. This problem arises in phylogenetics, and is related to latent tree graphical model inference. In general, the log-likelihood function is non-concave and may admit many critical points. Nevertheless, simple coordinate maximization has been known to perform well in practice, defying the complexity of the likelihood landscape. In this work, we provide the first theoretical guarantee as to why this might be the case. We show that deep inside the Kesten-Stigum reconstruction regime, given polynomially many samples m (assuming the tree is balanced), there exists a universal parameter regime (independent of the size of the tree) where the log-likelihood function is strongly concave and smooth with high probability. On this high-probability likelihood landscape event, we show that the standard coordinate maximization algorithm, started from a sufficiently close initial point, converges exponentially fast to the maximum likelihood estimator, which is within O(1/sqrt(m)) of the true parameter.
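As a rough illustration of the setting described in this abstract, the following Python sketch runs coordinate maximization on the smallest possible case, a 3-leaf star tree under the two-state symmetric model. It is not the authors' implementation. All function names, the parameterization theta = 1 - 2p (with p the flip probability on an edge), the box [0.01, 0.99], the ternary-search line maximizer, and the sample size are assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def simulate(thetas, m, rng):
    """Draw m leaf patterns (entries +1/-1) from a 3-leaf star tree.

    The root state is uniform; each leaf copies the root and flips with
    probability p = (1 - theta) / 2 (assumed parameterization).
    """
    samples = []
    for _ in range(m):
        root = rng.choice((-1, 1))
        samples.append(
            tuple(root if rng.random() < (1 + t) / 2 else -root for t in thetas)
        )
    return samples


def log_likelihood(thetas, counts):
    """Log-likelihood of pattern counts; each pattern has probability
    (1 + x1*x2*t1*t2 + x1*x3*t1*t3 + x2*x3*t2*t3) / 8."""
    t1, t2, t3 = thetas
    total = 0.0
    for (x1, x2, x3), n in counts.items():
        p = (1 + x1 * x2 * t1 * t2 + x1 * x3 * t1 * t3 + x2 * x3 * t2 * t3) / 8
        total += n * math.log(p)
    return total


def argmax_concave(f, lo, hi, iters=60):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


def coordinate_maximization(counts, init, sweeps=200, lo=0.01, hi=0.99):
    """Cyclically maximize the log-likelihood over one theta at a time.

    With the other two coordinates fixed, every pattern probability is
    affine in the free coordinate, so the restricted log-likelihood is
    concave and a one-dimensional search suffices.
    """
    thetas = list(init)
    for _ in range(sweeps):
        for i in range(3):
            def f(t, i=i):
                trial = thetas.copy()
                trial[i] = t
                return log_likelihood(trial, counts)
            thetas[i] = argmax_concave(f, lo, hi)
    return thetas


rng = random.Random(0)
true_thetas = (0.8, 0.7, 0.6)          # hypothetical ground truth
counts = Counter(simulate(true_thetas, 20000, rng))
init = (0.7, 0.7, 0.7)                 # a reasonably close starting point
estimate = coordinate_maximization(counts, init)
print("estimate:", [round(t, 3) for t in estimate])
```

On a tree this small the likelihood landscape is simple, so the sketch only shows the mechanics of the update. The abstract's guarantees concern large balanced trees, which this example does not attempt to reproduce.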
Free, publicly-accessible full text available July 16, 2026
Sample Complexity of Branch-length Estimation by Maximum Likelihood

Clancy, David Jr; Lyu, Hanbaek; Roch, Sebastien (July 2025, Forty-Second International Conference on Machine Learning (ICML) 2025)

Free, publicly-accessible full text available July 13, 2026
Maximum Likelihood Estimation for Unrooted 3-Leaf Trees: An Analytic Solution for the CFN Model

https://doi.org/10.1007/s11538-024-01340-x

Hill, Max; Roch, Sebastien; Rodriguez, Jose Israel (September 2024, Bulletin of Mathematical Biology)

Maximum likelihood estimation is among the most widely used methods for inferring phylogenetic trees from sequence data. This paper solves the problem of computing solutions to the maximum likelihood problem for 3-leaf trees under the 2-state symmetric mutation model (CFN model). Our main result is a closed-form solution to the maximum likelihood problem for unrooted 3-leaf trees, given generic data; this result characterizes all of the ways that a maximum likelihood estimate can fail to exist for generic data and provides theoretical validation for predictions made in Parks and Goldman (Syst Biol 63(5):798–811, 2014). Our proof makes use of both classical tools for studying group-based phylogenetic models such as Hadamard conjugation and reparameterization in terms of Fourier coordinates, as well as more recent results concerning the semi-algebraic constraints of the CFN model. To be able to put these into practice, we also give a complete characterization to test genericity.
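For orientation, the best-behaved case of the 3-leaf problem can be sketched in a few lines of Python. Under the CFN model with theta = 1 - 2p on each edge (an assumed parameterization), the expected product of the states at leaves i and j is theta_i * theta_j. When the three empirical pairwise correlations are positive and solving those equations gives values strictly between 0 and 1, that solution maximizes the likelihood. The sketch covers only this interior case and returns None otherwise. It does not reproduce the paper's full closed-form result, which also handles the cases where a maximum likelihood estimate fails to exist. The function name and the example counts are hypothetical.

```python
import math


def cfn_three_leaf_interior_mle(counts):
    """Return (theta1, theta2, theta3) for an unrooted 3-leaf CFN tree when
    the correlation-matching solution lies strictly inside (0, 1), else None.

    `counts` maps leaf patterns (tuples of +1/-1) to observed frequencies.
    """
    m = sum(counts.values())

    def corr(i, j):
        # empirical mean of x_i * x_j over all observed patterns
        return sum(n * x[i] * x[j] for x, n in counts.items()) / m

    c12, c13, c23 = corr(0, 1), corr(0, 2), corr(1, 2)
    if min(c12, c13, c23) <= 0:
        return None  # boundary or degenerate case, not handled in this sketch
    thetas = (
        math.sqrt(c12 * c13 / c23),
        math.sqrt(c12 * c23 / c13),
        math.sqrt(c13 * c23 / c12),
    )
    if max(thetas) >= 1:
        return None  # solution falls outside the open interval (0, 1)
    return thetas


# Hypothetical counts: 8000 times the exact pattern probabilities
# for thetas (0.8, 0.5, 0.5).
example_counts = {
    (1, 1, 1): 2050, (-1, -1, -1): 2050,
    (1, 1, -1): 750, (-1, -1, 1): 750,
    (1, -1, 1): 750, (-1, 1, -1): 750,
    (1, -1, -1): 450, (-1, 1, 1): 450,
}
print(cfn_three_leaf_interior_mle(example_counts))
```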
Full Text Available
Pairwise sequence alignment at arbitrarily large evolutionary distance

https://doi.org/10.1214/23-AAP2009

Legried, Brandon; Roch, Sebastien (June 2024, The Annals of Applied Probability)

Full Text Available
QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent

Tabatabaee, Yasamin; Roch, Sebastien; Warnow, Tandy (November 2023, Journal of Computational Biology)

Full Text Available
Expanding the Class of Global Objective Functions for Dissimilarity-Based Hierarchical Clustering

https://doi.org/10.1007/s00357-023-09447-x

Roch, Sebastien (September 2023, Journal of Classification)

Full Text Available
Inconsistency of Triplet-Based and Quartet-Based Species Tree Estimation under Intralocus Recombination

https://doi.org/10.1089/cmb.2022.0265

Hill, Max; Roch, Sebastien (November 2022, Journal of Computational Biology)

Full Text Available
Species tree estimation under joint modeling of coalescence and duplication: Sample complexity of quartet methods

https://doi.org/10.1214/22-AAP1799

Hill, Max; Legried, Brandon; Roch, Sebastien (December 2022, The Annals of Applied Probability)

Full Text Available
An impossibility result for phylogeny reconstruction from k-mer counts

https://doi.org/10.1214/22-AAP1805

Fan, Wai-Tong Louis; Legried, Brandon; Roch, Sebastien (December 2022, The Annals of Applied Probability)

Full Text Available
A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements

https://doi.org/10.1007/s00285-022-01731-5

Dasarathy, Gautam; Mossel, Elchanan; Nowak, Robert; Roch, Sebastien (April 2022, Journal of Mathematical Biology)

Full Text Available
